Journal: Communications Biology
Article Title: Cross-platform normalization enables machine learning model training on microarray and RNA-seq data simultaneously
doi: 10.1038/s42003-023-04588-6
Figure Legend Snippet: a TCGA matched samples (520 from BRCA, 150 from GBM) run on both microarray and RNA-seq were split into a training set (2/3) and test set (1/3). b RNA-seq samples were titrated into each training set, 10% at a time (0–100%), resulting in eleven training sets for each normalization method. Each RNA-seq sample replaces its matched microarray sample. Cross-platform normalization methods were applied to each training set independently. c We used three supervised algorithms to train classifiers (molecular subtype and mutation status of TP53 and PIK3CA in both BRCA and GBM) on each training set and tested on the microarray and RNA-seq test sets. The test sets were projected onto and back out of the training set space using unsupervised Principal Components Analysis to obtain reconstructed test sets. The subtype classifiers trained in step 3A were used to predict on the reconstructed test sets. Pathways regulating gene expression were identified using the unsupervised method PLIER.
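Illustrative sketch (not the authors' code): the titration in panel b and the PCA reconstruction in panel c can be outlined in a few lines of NumPy. Everything named here is hypothetical (`titrate_rnaseq`, `build_training_set`, `pca_reconstruct`, the toy matrix sizes, the random seed), and the sketch assumes that the samples to swap are drawn at random at each titration level and that rows are matched samples with columns as genes. Cross-platform normalization and classifier training are left out.

```python
import numpy as np


def titrate_rnaseq(n_samples, fraction, rng):
    """Return a boolean mask; True marks samples taken from RNA-seq."""
    n_rnaseq = int(round(fraction * n_samples))
    mask = np.zeros(n_samples, dtype=bool)
    # Assumption: samples to swap are chosen uniformly at random.
    mask[rng.choice(n_samples, size=n_rnaseq, replace=False)] = True
    return mask


def build_training_set(microarray, rnaseq, mask):
    """Replace each selected microarray row with its matched RNA-seq row."""
    return np.where(mask[:, None], rnaseq, microarray)


def pca_reconstruct(train, test, n_components):
    """Project test samples onto the training PCA space and back out."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal axes of the centered training data.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_components]
    scores = (test - mean) @ components.T
    return scores @ components + mean


# Toy data standing in for matched expression matrices (samples x genes).
rng = np.random.default_rng(0)
n_train, n_test, n_genes = 20, 6, 5
microarray = rng.normal(size=(n_train, n_genes))
rnaseq = rng.normal(size=(n_train, n_genes))
test_set = rng.normal(size=(n_test, n_genes))

# Eleven titration levels: 0%, 10%, ..., 100% RNA-seq.
fractions = np.linspace(0.0, 1.0, 11)
masks = [titrate_rnaseq(n_train, f, rng) for f in fractions]
training_sets = [build_training_set(microarray, rnaseq, m) for m in masks]

# Reconstruct the test set in the space of one mixed training set.
reconstructed = pca_reconstruct(training_sets[5], test_set, n_components=2)
```

Because each selected RNA-seq row overwrites its matched microarray row, every training set keeps the same sample count, and only the platform mix changes across the eleven levels.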
Article Snippet: For BRCA (520 pairs of matched samples), we used log2-transformed, lowess normalized Agilent 244K microarray data and RSEM (RNA-seq by Expectation Maximization) gene-level count RNA-seq data.
Techniques: Microarray, RNA Sequencing Assay, Mutagenesis, Expressing